HIVE-28669: Deadlock found when TxnStoreMutex trying to acquireLock #5585
Based on the comments on JIRA, I have tested this patch locally and haven't seen the error stack trace, hence
@dengzhhu653, before HIVE-27481 we always used
if that is changed to default, should we restore the
...store/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnDummyMutex.java
public TransactionContext getNewTransaction(int propagation) {
TransactionContext context = new TransactionContext(realTransactionManager.getTransaction(
new DefaultTransactionDefinition(propagation)), this);
public TransactionContext getNewTransaction(Transactional transactional) {
why not create a parameterless method and use Propagation.REQUIRED as default?
this transactional is retrieved from the method annotation,
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java#L281
each method might have its own definition, though we only care about the isolation level and propagation now
by default, the isolation is default, and we don't specify another level elsewhere.
we don't introduce the isolation() in the TxnStore, the level can be specified via:
@Transactional(value = POOL_TX, isolation = Isolation.READ_COMMITTED, propagation = Propagation.REQUIRED, timeout = 30)
GetOpenTxnsResponse getOpenTxns() throws MetaException;
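As a rough illustration of how per-method settings like these could be read back from an annotation at runtime, here is a minimal, dependency-free sketch. Everything in it is hypothetical: DemoTransactional, DemoIsolation, DemoPropagation and DemoTxnStore are stand-ins for Spring's @Transactional, Isolation, Propagation and Hive's TxnStore, and the sketch does not reproduce how Hive's proxy actually performs the lookup.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-ins for Spring's Isolation and Propagation enums.
enum DemoIsolation { DEFAULT, READ_COMMITTED, REPEATABLE_READ }
enum DemoPropagation { REQUIRED, REQUIRES_NEW }

// Hypothetical stand-in for Spring's @Transactional; kept at runtime so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DemoTransactional {
  DemoIsolation isolation() default DemoIsolation.DEFAULT;
  DemoPropagation propagation() default DemoPropagation.REQUIRED;
}

// Hypothetical store interface; only the annotations matter for this sketch.
interface DemoTxnStore {
  @DemoTransactional(isolation = DemoIsolation.READ_COMMITTED)
  String getOpenTxns();

  @DemoTransactional
  void heartbeat();
}

final class AnnotationLookup {
  /** Returns "ISOLATION/PROPAGATION" as declared on the named no-argument method. */
  static String describe(Class<?> type, String methodName) {
    try {
      Method method = type.getMethod(methodName);
      DemoTransactional tx = method.getAnnotation(DemoTransactional.class);
      if (tx == null) {
        throw new IllegalStateException(methodName + " is not annotated");
      }
      return tx.isolation() + "/" + tx.propagation();
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException("No such method: " + methodName, e);
    }
  }
}
```

Each method carries its own definition, so a method that does not name an isolation level reports DEFAULT, which is the case the rest of this thread is concerned with.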
how about:
public TransactionContext getNewTransaction(int propagation, int isolation) {
  DefaultTransactionDefinition transactionDefinition = new DefaultTransactionDefinition(propagation);
  transactionDefinition.setIsolationLevel((isolation != Isolation.DEFAULT.value()) ?
      isolation : ISOLATION_READ_COMMITTED);
  TransactionContext context = new TransactionContext(
      realTransactionManager.getTransaction(transactionDefinition), this);
  contexts.set(context);
  return context;
}
public TransactionContext getNewTransaction(int propagation) {
  return getNewTransaction(propagation, ISOLATION_READ_COMMITTED);
}
....
context = jdbcResource.getTransactionManager().getNewTransaction(
    transactional.propagation().value(), transactional.isolation().value());
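The fallback rule in the suggestion above can be isolated into a small pure function, which makes it easy to check without a database or Spring on the classpath. This is only a sketch: IsolationDefaults and resolveIsolation are hypothetical names, and the local ISOLATION_DEFAULT constant is assumed to mirror Spring's TransactionDefinition.ISOLATION_DEFAULT (-1), whose other isolation constants share their values with java.sql.Connection.

```java
import java.sql.Connection;

final class IsolationDefaults {
  // Assumption: mirrors Spring's TransactionDefinition.ISOLATION_DEFAULT.
  static final int ISOLATION_DEFAULT = -1;

  /**
   * Keeps an explicitly requested isolation level and falls back to
   * READ COMMITTED when the caller left the level at its default.
   */
  static int resolveIsolation(int requested) {
    return requested != ISOLATION_DEFAULT
        ? requested
        : Connection.TRANSACTION_READ_COMMITTED;
  }
}
```

With this rule an unannotated level never reaches the database as "use the server default", which on MySQL would be REPEATABLE READ.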
Done
so, you decided to get rid of the isolation level? that's fine, but in that case, why do we need 2nd method without the propagation arg?
done, removed the 2nd method
...c/main/java/org/apache/hadoop/hive/metastore/txn/jdbc/functions/AbortCompactionFunction.java
...e/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/leader/LeaderElection.java
I have made the default as the
0390bf3 to d4fedb1
LGTM +1, pending tests
Quality Gate passed
What changes were proposed in this pull request?
Resolving the deadlock when the backend DB is MySQL
Why are the changes needed?
MySQL's default isolation level is REPEATABLE-READ. Under this isolation level, for update holds a gap lock; if multiple clients try to for update and then insert into the same gap, it could cause a deadlock.
Does this PR introduce any user-facing change?
A new configuration property, hive.metastore.have.multiple.leaders: if it's false, then the same housekeeping tasks will not leverage the db mutex to block each other.
Is the change a dependency upgrade?
No
How was this patch tested?
Tested the PR locally: querying MySQL's information_schema.INNODB_TRX table shows that the trx_isolation_level of the new transaction is 'READ COMMITTED'.
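A check like the one described could be scripted roughly as follows. This is a sketch under assumptions, not the procedure actually used for this PR: InnodbTrxCheck and its methods are hypothetical names, the caller is assumed to supply an open JDBC connection to a MySQL instance with permission to read information_schema.INNODB_TRX, and only the string comparison can be exercised without a database.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

final class InnodbTrxCheck {
  // Lists the isolation level of every transaction InnoDB currently tracks.
  static final String QUERY =
      "SELECT trx_isolation_level FROM information_schema.INNODB_TRX";

  /** True when the reported level is READ COMMITTED, ignoring case and hyphenation. */
  static boolean isReadCommitted(String level) {
    return level != null
        && level.trim().replace('-', ' ').equalsIgnoreCase("READ COMMITTED");
  }

  /** Runs QUERY on the supplied connection and returns the reported levels. */
  static List<String> openTrxIsolationLevels(Connection conn) throws SQLException {
    List<String> levels = new ArrayList<>();
    try (Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery(QUERY)) {
      while (rs.next()) {
        levels.add(rs.getString("trx_isolation_level"));
      }
    }
    return levels;
  }
}
```

The query only sees transactions that are open at the moment it runs, so the metastore transaction under test has to be in flight while the check executes.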